Question: How to find intron co-ordinates from the gff file?
5
gravatar for Chirag Parsania
3.4 years ago by
Chirag Parsania1.4k
University of Macau
Chirag Parsania1.4k wrote:

HI 

Does anyone know how can I find intronic co-ordinates from the gff file ? My .gff file contains following features with co-ordinates. Unfortunately there's no intron co-ordinate

chromosome,gene,mRNA,exon,CDS,tRNA,retrotransposon,long_terminal_repeat  snoRNA,blocked_reading_frame,centromere,repeat_region,pseudogene,snRNA,ncRNA ,rRNA

Give your inputs

Thanks!

intron open gff bedtools • 2.4k views
ADD COMMENTlink modified 3.4 years ago by Devon Ryan88k • written 3.4 years ago by Chirag Parsania1.4k

Hi,

Your introns are between each exons within a given gene. They're not precisely labelled, however, one can find their begin/end positions since exons are clearly defined.

ADD REPLYlink written 3.4 years ago by Thibault D.510

Isn't it annoying that to find introns one has to go through a relatively complex procedure, as Devon suggests?

ADD REPLYlink written 3.4 years ago by dariober9.9k

Exactly @dariober..! I can do that in excel as well. But I wonder if somebody has already wrote some program for this. Thanks for your inputs. 

ADD REPLYlink written 3.4 years ago by Chirag Parsania1.4k
6
gravatar for Devon Ryan
3.4 years ago by
Devon Ryan88k
Freiburg, Germany
Devon Ryan88k wrote:

I ended up needing to do something like this yesterday. Here is an RScript that should more or less work. Note that in my case I wanted introns by genes. If you want introns per transcript then just use intronsByTranscript(), which is much simpler.

library(GenomicFeatures)
library(rtracklayer)

gtf <- makeTxDbFromGFF("genes.gtf") #change me!
exons <- exonsBy(gtf, by="gene")

#make introns
exons <- reduce(exons)
exons <- exons[sapply(exons, length) > 1]

introns <- lapply(exons, function(x) {
    #Make a "gene" GRange object
    gr = GRanges(seqnames=seqnames(x)[1], ranges=IRanges(start=min(start(x)),
        end=max(end(x))), 
        strand=strand(x)[1])
    db = disjoin(c(x, gr))
    ints = db[countOverlaps(db, x) == 0]
    #Add an ID
    if(as.character(strand(ints)[1]) == "-") {
        ints$exon_id = c(length(ints):1)
    } else {
        ints$exon_id = c(1:length(ints))
    }
    ints
})
introns <- GRangesList(introns)
ADD COMMENTlink modified 3.4 years ago • written 3.4 years ago by Devon Ryan88k

Hi Devon,

Thanks for the script. I have tried to run that. Uploaded library successfully. But then I got an error that 

Error: could not find function "makeTxDbFromGFF"​

Here is the first few lines of session info 

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rtracklayer_1.26.3     GenomicFeatures_1.18.7 AnnotationDbi_1.28.2   Biobase_2.26.0         GenomicRanges_1.18.4  
 

Kindly tell me how can I update package and run function "makeTxDbFromGFF"

Chirag

ADD REPLYlink written 3.4 years ago by Chirag Parsania1.4k

HI Davon,

It worked.! I used "makeTranscriptDbFromGFF" instead of "makeTxDbFromGFF". Thank you so much.

Chirag

 

ADD REPLYlink modified 3.4 years ago • written 3.4 years ago by Chirag Parsania1.4k

Hi Devon I just found this script of yours for extracting introns...the scripts is running fine but at the last step its stucked for longer and I am not getting any output..any idea why?

ADD REPLYlink written 2.1 years ago by ####180

Hi @Devon I was trying your above mentioned code and I am getting following errror:

    > exons
GRangesList object of length 37989:
$ENSG00000000003 
GRanges object with 20 ranges and 2 metadata columns:
       seqnames                 ranges strand |   exon_id       exon_name
          <Rle>              <IRanges>  <Rle> | <integer>     <character>
   [1]        X [100627109, 100629986]      - |    706021 ENSE00003730948
   [2]        X [100628670, 100629986]      - |    706022 ENSE00001459322
   [3]        X [100630759, 100630866]      - |    706023 ENSE00000868868
   [4]        X [100632063, 100632068]      - |    706024 ENSE00003731560
   [5]        X [100632485, 100632568]      - |    706025 ENSE00000401072
   ...      ...                    ...    ... .       ...             ...
  [16]        X [100635558, 100635746]      - |    706036 ENSE00003733424
  [17]        X [100636191, 100636689]      - |    706037 ENSE00001886883
  [18]        X [100636608, 100636806]      - |    706038 ENSE00001855382
  [19]        X [100636793, 100637104]      - |    706039 ENSE00001863395
  [20]        X [100639945, 100639991]      - |    706040 ENSE00001828996

...
<37988 more elements>
-------
seqinfo: 270 sequences (1 circular) from an unspecified genome; no seqlengths
> 
> introns <- lapply(exons, function(x) {
+     #Make a "gene" GRange object
+     gr = GRanges(seqnames=seqnames(x)[1], ranges=IRanges(start=min(start(x)),
+                                                          end=max(end(x))), 
+                  strand=strand(x)[1])
+     db = disjoin(c(x, gr))
+     ints = db[countOverlaps(db, x) == 0]
+     #Add an ID
+     if(as.character(strand(ints)[1]) == "-") {
+         ints$exon_id = c(length(ints):1)
+     } else {
+         ints$exon_id = c(1:length(ints))
+     }
+     ints
+ })
Error in .Method(..., deparse.level = deparse.level) : 
  number of columns for arg 2 do not match those of first arg
Called from: .Method(..., deparse.level = deparse.level)
Browse[1]>
  

Any inputs why it is so ? and how to overcome this?

ADD REPLYlink modified 22 months ago • written 22 months ago by ####180
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1976 users visited in the last hour