Question: How to count number of unique exons/introns (and their locations) in a GTF file?
3.2 years ago, SmallChess • 490 (Australia) wrote:

I have a large GTF file and want the following information:

  • Number of unique exons
  • Number of unique introns
  • Locus of those unique exons
  • Locus of those unique introns

What would be the best way to do that?

exons gtf • 1.6k views

To somewhat reiterate what venu said, this depends on how one defines "unique". If you just want to get rid of exons/introns that are shared between transcripts, make a list of all exons/introns, sort it, and run it through uniq. If, on the other hand, you want to merge overlapping exons so that no intron overlaps an exon (possibly regardless of strand), you'll want to use either bedtools or GenomicRanges in R.
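A minimal sketch of the first option (deduplicating by exact coordinates), written in Python rather than with sort and uniq. It assumes a standard 9-column, tab-separated GTF whose exon lines carry a transcript_id attribute, that all exons of a transcript sit on one chromosome and strand, and it defines an intron as the gap between consecutive exons of the same transcript. The function name and the example records are made up for illustration.

```python
import re
from collections import defaultdict


def unique_exons_and_introns(gtf_lines):
    """Return (exons, introns) as sets of (chrom, start, end, strand) tuples."""
    per_transcript = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records are needed
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        per_transcript[match.group(1)].append((chrom, start, end, strand))

    exons, introns = set(), set()
    for tx_exons in per_transcript.values():
        tx_exons.sort(key=lambda e: (e[1], e[2]))
        exons.update(tx_exons)  # the set drops exons shared between transcripts
        for left, right in zip(tx_exons, tx_exons[1:]):
            if right[1] > left[2] + 1:  # a real gap between consecutive exons
                introns.add((left[0], left[2] + 1, right[1] - 1, left[3]))
    return exons, introns


# Hypothetical records: three transcripts of one gene sharing some exons.
example = [
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tdemo\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tdemo\texon\t500\t600\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
    'chr1\tdemo\texon\t500\t600\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
    'chr1\tdemo\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t3";',
    'chr1\tdemo\texon\t500\t600\t.\t+\t.\tgene_id "g1"; transcript_id "t3";',
]

exons, introns = unique_exons_and_introns(example)
print(len(exons), "unique exons;", len(introns), "unique introns")
for chrom, start, end, strand in sorted(introns):
    print(chrom, start, end, strand)
```

For a real file, the list can be replaced by an open file handle, since the function only iterates over lines. Coordinates stay 1-based and inclusive, as in GTF.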

written 3.2 years ago by Devon Ryan • 91k

This might work for exons, but what about introns?

written 3.2 years ago by SmallChess • 490

That's the point of merging exons within genes (or between them, if that matters to you): once the exons are merged, the introns are simply the gaps between them. In R that's reduce(); in bedtools I think you can merge a file with itself.
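To illustrate the merging idea without R or bedtools, here is a rough Python sketch. It assumes the exons of one gene have already been collected as 1-based, inclusive (start, end) pairs, and it merges adjacent as well as overlapping intervals. Both function names are hypothetical, and this only approximates what reduce() does.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent closed intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


def gaps_between(merged):
    """Return the gaps (introns) between consecutive merged intervals."""
    return [
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(merged, merged[1:])
    ]


# Hypothetical exons of a single gene, pooled from several transcripts.
gene_exons = [(100, 200), (150, 250), (300, 400), (300, 400), (500, 600)]

merged_exons = merge_intervals(gene_exons)
merged_introns = gaps_between(merged_exons)
print("merged exons:", merged_exons)
print("introns:", merged_introns)
```

Running this once per gene (and per strand, if strand matters) gives non-overlapping exons and introns by construction.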

written 3.2 years ago by Devon Ryan • 91k

What do you mean by a unique exon? Obviously, each exon has a different locus. Do you mean that you have duplicates in your GTF (exons with the same locations)?

modified 3.2 years ago • written 3.2 years ago by venu • 6.2k

I mean that different transcripts in the file can share the same exons, and those duplicates must be filtered out.

written 3.2 years ago by SmallChess • 490