Question: parsing through gtf file
a.rex wrote (2.1 years ago):

I have a GTF file with the following layout: each 'transcript' feature (i.e. the full length of the transcript) is followed by the exons within that transcript. For example:

C7123483        cam  transcript         1       8268    .       +       .       gene_id "00001"; transcript_id "00001";
C7123483        cam    exon              1       206      .       +       .       gene_id "00001"; transcript_id "00001";
C7123483        cam    exon             263     749     .       +       .       gene_id "00001"; transcript_id "00001";

Since this file only contains the coordinates for the exons, I would also like it to include the intron coordinates. Presumably each intron runs from the base after the end of one exon to the base before the start of the next exon. Does anyone have experience doing this? Are there any tools that do this automatically? I am struggling to write a script.
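
For the two exons in the example above, the arithmetic can be sketched as follows. This is only an illustration, assuming the exons belong to one transcript and do not overlap; the function name `introns_between` is hypothetical. GTF coordinates are 1-based and inclusive, so an intron starts one base after the previous exon ends and stops one base before the next exon starts.

```python
def introns_between(exons):
    """Return (start, end) intron intervals between the exons of one transcript.

    Coordinates are 1-based and inclusive, as in GTF.
    Assumes the exons do not overlap each other.
    """
    ordered = sorted(exons)
    introns = []
    # Walk over consecutive exon pairs: (previous exon, next exon).
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start - prev_end > 1:  # skip exons that touch each other
            introns.append((prev_end + 1, next_start - 1))
    return introns


print(introns_between([(1, 206), (263, 749)]))  # [(207, 262)]
```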

I need the exon/intron coordinates because I have another BED file whose coordinates I need to match with the exon/intron/transcript_id/gene_id information from the GTF file.

I hope this makes sense - I am very new to bioinformatics, and any help would be very much appreciated.


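Look into bedtools complement. Assuming your file contains only exons, this may work. If you need exons and introns in a single file, you can then combine the two files, for example by concatenating and sorting them; note that bedtools merge collapses overlapping or adjacent intervals within one file rather than joining two files.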
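
The same complement idea can also be sketched in plain Python, without bedtools. This is a minimal sketch, not a replacement for the tool: it assumes a tab-separated GTF whose attribute column contains `transcript_id "..."`, and the function name `intron_lines` is hypothetical. Gaps between the transcript boundaries and the first or last exon are not reported.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def intron_lines(gtf_lines):
    """Return GTF 'intron' lines for the gaps between exons of each transcript."""
    exons = defaultdict(list)  # transcript_id -> exon records (split fields)
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # skip comments, blank lines, and non-exon features
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            exons[match.group(1)].append(fields)

    out = []
    for records in exons.values():
        records.sort(key=lambda f: (int(f[3]), int(f[4])))
        covered_end = int(records[0][4])  # last base covered by exons so far
        for fields in records[1:]:
            start, end = int(fields[3]), int(fields[4])
            if start > covered_end + 1:
                # Gap between exons, written 1-based inclusive like the input.
                intron = (fields[:2]
                          + ["intron", str(covered_end + 1), str(start - 1)]
                          + fields[5:])
                out.append("\t".join(intron))
            covered_end = max(covered_end, end)
    return out


example = [
    'C7123483\tcam\ttranscript\t1\t8268\t.\t+\t.\tgene_id "00001"; transcript_id "00001";',
    'C7123483\tcam\texon\t1\t206\t.\t+\t.\tgene_id "00001"; transcript_id "00001";',
    'C7123483\tcam\texon\t263\t749\t.\t+\t.\tgene_id "00001"; transcript_id "00001";',
]
for line in intron_lines(example):
    print(line)
```

For the example records this prints a single intron line spanning 207 to 262. The intron lines can then be appended to the original lines and sorted by position to get everything in one file.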

Reply written 2.1 years ago by genomax

I would suggest looking into the MISO annotation for all annotated introns, since the region between two exons may not be annotated as an "intron" but may instead be a potential regulatory sequence (5' UTR, 3' UTR, snRNA, etc.) that has not yet been annotated. To be called an intron, a region needs evidence from the intron/exon junctions (i.e. its expression depends on the flanking exons). Have a look here:

https://miso.readthedocs.io/en/fastmiso/annotation.html

Furthermore, you can use http://rnaseqlib.readthedocs.org/ to make your own annotation.

Reply written 2.1 years ago by fusion.slope
Jeffin Rockey wrote (2.1 years ago):

Another alternative:

There are a couple of posts describing the -addintrons option of the gt gff3 tool from the GenomeTools suite. That should be quite useful for you.

For the second requirement, you can subsequently use bedtools intersect.
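
What the intersect step has to do can be illustrated with a small Python sketch. It is not how bedtools is implemented; the function names `overlaps` and `annotate` are hypothetical, and the brute-force loop is only meant for small inputs. The main point is the coordinate convention: BED is 0-based with an exclusive end, while GTF is 1-based with an inclusive end.

```python
def overlaps(bed_record, gtf_record):
    """True if a BED interval and a GTF feature share at least one base.

    bed_record: (chrom, start, end), 0-based, end-exclusive (BED convention).
    gtf_record: (chrom, feature, start, end), 1-based, inclusive (GTF convention).
    """
    b_chrom, b_start, b_end = bed_record
    g_chrom, _, g_start, g_end = gtf_record
    # b_start + 1 is the first BED base in 1-based terms;
    # b_end is already the last BED base in 1-based terms.
    return b_chrom == g_chrom and b_start + 1 <= g_end and g_start <= b_end


def annotate(bed_records, gtf_records):
    """Pair every BED interval with the GTF features it overlaps (brute force)."""
    return [(bed, gtf)
            for bed in bed_records
            for gtf in gtf_records
            if overlaps(bed, gtf)]


features = [
    ("C7123483", "exon", 1, 206),
    ("C7123483", "intron", 207, 262),
    ("C7123483", "exon", 263, 749),
]
hits = annotate([("C7123483", 200, 210)], features)
print([gtf[1] for _, gtf in hits])  # ['exon', 'intron']
```

The BED interval 200-210 covers bases 201 to 210 in GTF terms, so it touches the end of the first exon and the start of the intron.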

Marge wrote (2.1 years ago):

There are already multiple posts on Biostars that discuss conversion from GTF to BED, e.g.:

How To Convert Gencode Gtf Into Bed Format ?

Converting gtf format to bed format

How To Convert Hg19_Known_Gene From Text Format To Gtf Or Bed?

How to convert GTF format to BED format?

Did you try any of those solutions, and if so, what is not working for you?

Best, Marge


These links don't answer a critical part of the original question, which is how to find intervals for the introns and include them in the same file.

Reply written 2.1 years ago by genomax
Marge wrote (2.1 years ago):

Apologies for missing the point. I assumed that getting the full-length and exon coordinates in BED format would automatically allow one to find which coordinates fall within the introns.


Please use ADD COMMENT/ADD REPLY when responding to existing posts. SUBMIT ANSWERS should only be used for new answers to original question.

Reply written 2.1 years ago by genomax