Question: parsing through gtf file
21 months ago, a.rex wrote:

I have a GTF file with the following layout: a 'transcript' feature (i.e. the full length of the transcript) followed by the exons within that transcript. For example:

C7123483    cam    transcript    1      8268    .    +    .    gene_id "00001"; transcript_id "00001";
C7123483    cam    exon          1      206     .    +    .    gene_id "00001"; transcript_id "00001";
C7123483    cam    exon          263    749     .    +    .    gene_id "00001"; transcript_id "00001";

Since this file contains coordinates only for the exons, I would also like it to include the intron coordinates. Presumably I would have to work out each intron from the end coordinate of the previous exon and the start coordinate of the next exon. Does anyone have experience doing this? Are there any tools that do this automatically? I am struggling to write a script.
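For illustration only, here is a minimal Python sketch of that idea. In GTF's 1-based, inclusive coordinates, the intron between two consecutive exons of a transcript runs from the previous exon's end + 1 to the next exon's start - 1. The helper names (`_attr`, `intron_records`) and the "intron" feature label are assumptions made for this example, not part of any existing tool, and the code assumes every exon line carries both gene_id and transcript_id.

```python
import re
from collections import defaultdict


def _attr(attributes, key):
    """Return the quoted value of `key` from a GTF attribute string, or None."""
    match = re.search(key + r' "([^"]*)"', attributes)
    return match.group(1) if match else None


def intron_records(gtf_lines):
    """Build one 'intron' GTF line per gap between consecutive exons of the
    same transcript (hypothetical helper, written for this example)."""
    # (seqid, source, strand, gene_id, transcript_id) -> list of (start, end)
    exons = defaultdict(list)
    for line in gtf_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # GTF is tab-delimited; fall back to whitespace for pasted examples.
        fields = line.split("\t") if "\t" in line else line.split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue
        gene_id = _attr(fields[8], "gene_id")
        transcript_id = _attr(fields[8], "transcript_id")
        if gene_id is None or transcript_id is None:
            continue  # assumption: skip exon lines missing either attribute
        key = (fields[0], fields[1], fields[6], gene_id, transcript_id)
        exons[key].append((int(fields[3]), int(fields[4])))

    introns = []
    for (seqid, source, strand, gene_id, transcript_id), spans in exons.items():
        spans.sort()
        attributes = 'gene_id "%s"; transcript_id "%s";' % (gene_id, transcript_id)
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            if next_start - prev_end > 1:  # skip abutting or overlapping exons
                introns.append("\t".join([
                    seqid, source, "intron",
                    str(prev_end + 1), str(next_start - 1),
                    ".", strand, ".", attributes,
                ]))
    return introns


if __name__ == "__main__":
    sample = [
        'C7123483 cam transcript 1 8268 . + . gene_id "00001"; transcript_id "00001";',
        'C7123483 cam exon 1 206 . + . gene_id "00001"; transcript_id "00001";',
        'C7123483 cam exon 263 749 . + . gene_id "00001"; transcript_id "00001";',
    ]
    for record in intron_records(sample):
        print(record)  # one intron line spanning 207..262
```

The intron lines could then be appended to the original file and the result re-sorted by coordinate. Note that this only covers gaps between exons; it says nothing about the part of the transcript after the last listed exon.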

I need to find the exon/intron coordinates because I have another BED file whose coordinates I need to match with the exon/intron/transcript_id/gene_id information from the GTF file.

I hope this makes sense - I am very new to bioinformatics, and any help would be very much appreciated.


Look into bedtools complement. Assuming you have only exons in your file, this may work. You can then use bedtools merge to merge the two files, if you need this information in a single file.

Comment written 21 months ago by genomax

I would suggest looking into the MISO annotation for all the possible annotated introns, since the region between two exons may not be annotated as an "intron" but could instead be a potential regulatory sequence (5' UTR, 3' UTR, snRNA, etc., not yet annotated). For a region to count as an intron, you need evidence that it is annotated based on the intron/exon junction (i.e. its expression depends on the flanking exons). Have a look here:

https://miso.readthedocs.io/en/fastmiso/annotation.html

Furthermore, you can use http://rnaseqlib.readthedocs.org/ to make your own annotation.

Comment written 21 months ago by fusion.slope
21 months ago, Jeffin Rockey (Karimannoor) wrote:

Another alternative:

A couple of posts describe how to use the -addintrons option of the gt gff3 tool from the GenomeTools suite. That should be quite useful for you.

For the second requirement, you can then use bedtools intersect.
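To make the matching step concrete, here is a small Python sketch of the overlap test that an intersect performs. It is only an illustration, not a replacement for bedtools: the function name `annotate_bed` and the tuple layouts are assumptions for this example. The main point is the coordinate shift: BED intervals are 0-based and half-open, while GTF features are 1-based and inclusive.

```python
from collections import defaultdict


def annotate_bed(bed_intervals, features):
    """Report every (BED interval, feature) pair that overlaps.

    bed_intervals: iterable of (chrom, start, end) in BED coordinates
                   (0-based, half-open).
    features:      iterable of (chrom, start, end, feature, transcript_id)
                   in GTF coordinates (1-based, inclusive).
    Hypothetical helper written for this example.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, feature, transcript_id in features:
        by_chrom[chrom].append((start, end, feature, transcript_id))

    hits = []
    for chrom, bed_start, bed_end in bed_intervals:
        # Convert BED to 1-based inclusive so both sides use one convention.
        gtf_start, gtf_end = bed_start + 1, bed_end
        for feat_start, feat_end, feature, transcript_id in by_chrom.get(chrom, []):
            if gtf_start <= feat_end and feat_start <= gtf_end:
                hits.append((chrom, bed_start, bed_end, feature, transcript_id))
    return hits


if __name__ == "__main__":
    features = [
        ("C7123483", 1, 206, "exon", "00001"),
        ("C7123483", 207, 262, "intron", "00001"),
        ("C7123483", 263, 749, "exon", "00001"),
    ]
    bed = [("C7123483", 199, 210), ("C7123483", 206, 262)]
    for hit in annotate_bed(bed, features):
        print(hit)
```

This scans every feature on the chromosome for each BED interval, which is fine for a small example but slow for whole-genome files; that is where a dedicated tool such as bedtools intersect is the better choice.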

21 months ago, Marge (Italy) wrote:

There are already multiple posts on Biostars that discuss conversion from GTF to BED, e.g.:

How To Convert Gencode Gtf Into Bed Format ?

Converting gtf format to bed format

How To Convert Hg19_Known_Gene From Text Format To Gtf Or Bed?

How to convert GTF format to BED format?

Did you try any of those solutions and, if so, what is not working for you?

Best, Marge


These links don't answer a critical part of the original question, which is how to find intervals for the introns and include them in the same file.

Comment written 21 months ago by genomax
21 months ago, Marge (Italy) wrote:

Apologies for missing the point. I assumed that getting the full-length and exon coordinates in BED format would automatically allow one to find which coordinates fall within the introns.


Please use ADD COMMENT/ADD REPLY when responding to existing posts. SUBMIT ANSWER should only be used for new answers to the original question.

Comment written 21 months ago by genomax