Question: Get all intronic regions from a generic GTF file

Hello, I have to make a Java program for a college course in which I have to find possible intron retention in a given sample.

I am stuck on the initial part: given a reference GTF file, I have to parse it and recover all intron regions from it (producing another, pruned GTF file).

I do not understand how to find where an intron starts and ends. Thanks

java gtf • 3.0k views
modified 2.7 years ago • written 2.7 years ago by alessandro.darmiento1991

Are you using a specific GTF file? Can you post a few lines? The latest GTF spec is available at this link.

written 2.7 years ago by genomax

Hi, thank you for the help. I forgot to mention (I think it's quite important) that my input file contains all and only the known exons of a human genome sample. I was thinking of going through that file chromosome by chromosome, then gene by gene, then transcript by transcript (I saw that the same gene can have multiple versions of itself due to splicing events), recording where each exon starts and ends, and then computing the introns as the complement of the exons.
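The grouping step described here could be sketched roughly as follows. This is only an illustration, not a tested solution for any particular file: the class name `ExonGrouper` and method `groupExons` are made up for the example, and it assumes a standard 9-column, tab-separated GTF where column 3 is the feature type, columns 4 and 5 are the 1-based inclusive start and end, and column 9 carries a `transcript_id "..."` attribute. It also assumes transcript IDs are unique across chromosomes; if they are not in your file, the chromosome (column 1) would need to be part of the key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExonGrouper {
    // Matches the transcript_id attribute in column 9, e.g. transcript_id "T1";
    private static final Pattern TRANSCRIPT_ID =
            Pattern.compile("transcript_id \"([^\"]+)\"");

    /** Groups exon [start, end] pairs (1-based, inclusive) by transcript_id. */
    public static Map<String, List<int[]>> groupExons(List<String> gtfLines) {
        Map<String, List<int[]>> exonsByTranscript = new LinkedHashMap<>();
        for (String line : gtfLines) {
            // Skip blank lines and comment/header lines.
            if (line.isEmpty() || line.startsWith("#")) continue;
            String[] cols = line.split("\t");
            // Keep only well-formed exon records.
            if (cols.length < 9 || !cols[2].equals("exon")) continue;
            Matcher m = TRANSCRIPT_ID.matcher(cols[8]);
            // Without a transcript_id the exon cannot be assigned to a transcript.
            if (!m.find()) continue;
            int start = Integer.parseInt(cols[3]);
            int end = Integer.parseInt(cols[4]);
            exonsByTranscript
                    .computeIfAbsent(m.group(1), k -> new ArrayList<>())
                    .add(new int[] {start, end});
        }
        return exonsByTranscript;
    }
}
```

Grouping by transcript rather than by gene matters because two transcripts of the same gene can use different exons, so their introns differ too.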

What do you think of my algorithm?

Thanks

written 2.7 years ago by alessandro.darmiento1991

Please use ADD REPLY/ADD COMMENT when responding to existing posts.

Introns are not "complementary" to exons (I am not sure in what sense you mean that). An intron is the interval between two consecutive exons of the same transcript, e.g. Exon_1-Intron_1-Exon_2-Intron_2-Exon_3 etc. More here: https://en.wikipedia.org/wiki/Exon
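As a rough Java sketch of that idea (the class name `IntronFinder` and method `introns` are hypothetical, chosen only for this example): take the exons of one transcript, sort them by start, and report the gap between each consecutive pair. Since GTF coordinates are 1-based and inclusive, the intron between two exons runs from the previous exon's end + 1 to the next exon's start - 1. The sketch assumes all exons passed in belong to the same transcript.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class IntronFinder {
    /**
     * Given the exons of ONE transcript as [start, end] pairs
     * (1-based, inclusive, as in GTF), returns the introns as
     * [start, end] pairs in the same coordinate system.
     */
    public static List<int[]> introns(List<int[]> exons) {
        // Work on a copy so the caller's list is not reordered.
        List<int[]> sorted = new ArrayList<>(exons);
        sorted.sort(Comparator.comparingInt((int[] e) -> e[0]));

        List<int[]> result = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            int intronStart = sorted.get(i - 1)[1] + 1; // base after previous exon
            int intronEnd = sorted.get(i)[0] - 1;       // base before next exon
            // Abutting or overlapping exons leave no gap, so no intron is reported.
            if (intronStart <= intronEnd) {
                result.add(new int[] {intronStart, intronEnd});
            }
        }
        return result;
    }
}
```

For exons at 100-200, 300-400 and 500-600, this gives introns at 201-299 and 401-499. A transcript with a single exon has no introns. Sorting by start position works for both strands; only the numbering of the introns (first, second, ...) would be reversed on the minus strand.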

Also see this thread for a nice graphic: What's The Difference Between CDS And ORF?

modified 2.7 years ago • written 2.7 years ago by genomax
brent_wilson wrote:

Hi Alessandro,

Here are a couple of relevant links that may be useful:

If you can use UCSC rather than being forced to use a GTF file: Bed File With Introns Only

A little more detail on a manual solution is here: https://biostar.usegalaxy.org/p/6453/

And some basic information on GTF parsing in Python: http://biopython.org/wiki/GFF_Parsing

It's tough to give an exact solution without seeing the file, but hopefully this is useful. Good luck!

Brent Wilson, PhD | Project Scientist | Cofactor Genomics


written 2.7 years ago by brent_wilson