Question: (Closed) Parse out exon for divergent primer design
11 months ago by
krushnach80 (440) wrote:

I am trying to parse out exons (mature exons). What I have done so far: I took the exon coordinates from the GTF file, converted them into a BED file, and then used those BED coordinates to extract the exon sequences from the FASTA file with the bedtools getfasta command.

The command I used is:

bedtools getfasta -fi hg19.fa -bed exon.bed -fo -exon_Seq -split

I would be glad to know whether the command I am using is correct or not.
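A minimal sketch of the GTF-to-BED step described above, in Python rather than shell. It assumes a standard nine-column GTF with `transcript_id` and `exon_number` in the attribute column; the function names `parse_attributes` and `gtf_exons_to_bed` and the example line are made up for illustration and are not from the original post. The one detail worth checking is the coordinate shift: GTF is 1-based and inclusive, BED is 0-based and half-open, so only the start changes.

```python
def parse_attributes(field):
    """Parse a GTF (key "value";) or GFF3 (key=value;) attribute column into a dict."""
    attrs = {}
    for item in field.strip().strip(";").split(";"):
        item = item.strip()
        if not item:
            continue
        if "=" in item and " " not in item.split("=", 1)[0]:
            key, value = item.split("=", 1)      # GFF3 style
        else:
            key, _, value = item.partition(" ")  # GTF style
        attrs[key] = value.strip().strip('"')
    return attrs


def gtf_exons_to_bed(lines):
    """Yield one BED6 line per exon feature found in the given GTF lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # skip gene, transcript, CDS and other features
        chrom, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = parse_attributes(cols[8])
        # Hypothetical naming scheme: <transcript_id>:<exon_number>
        name = "{}:{}".format(attrs.get("transcript_id", "."),
                              attrs.get("exon_number", "."))
        # GTF start is 1-based, BED start is 0-based; the end stays the same.
        yield "\t".join([chrom, str(start - 1), str(end), name, ".", strand])


# Illustrative GTF line (not copied from a real annotation file).
example = ('chr1\tHAVANA\texon\t11869\t12227\t.\t+\t.\t'
           'gene_id "ENSG00000223972.4"; transcript_id "ENST00000456328.2"; exon_number 1;')
for bed_line in gtf_exons_to_bed([example]):
    print(bed_line)
```

Writing the yielded lines to `exon.bed` would give a file that can be passed to `bedtools getfasta` via `-bed`, with the transcript ID kept in the name column so that exons can be grouped per transcript later.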

So now I have a file with the exon sequences for the whole genome. How do I parse out the mature exon sequences from this file? In the file I have entries such as:


That is just a small subset of my data. How do I decide on and find the mature sequences for each gene? After that, say I have a gene with 5 exons: I want to join the first exon and the last exon for use in downstream analysis.

So far I have used shell scripts for all of the above. I guess R wouldn't be of much use and Perl would be needed for the parsing. How do I do it?


Below is output from my coordinates BED file. I can see chr1 11868 12227 + exon and chr1 11871 12227 + exon, and the corresponding record derived from the GTF file is 11868 12227 exon:ENST00000456328.2:1. From this I understand that the gene has two exons, so I used these exon coordinates with bedtools getfasta to extract the respective exon sequences. Now I want to join the coordinates 11868 12227 and 11871 12227, which will give me the mature sequence. This is just an example; there are hundreds of exon coordinates like this, with one or more exons per gene. How do I parse out the mature coordinates, join the first and last exon of each gene, and get a sequence?

cat coordinates.bed | head -5
chr1    11868   12227   +   exon
chr1    11868   14409   +   transcript
chr1    11868   14412   +   gene
chr1    11871   12227   +   exon
chr1    11871   14412   +   transcript
 cat annotation.bed | head -1
chr1    11868   12227   exon:ENST00000456328.2:1    .   +   HAVANA  exon    .   ID=exon:ENST00000456328.2:1;Parent=ENST00000456328.2;gene_id=ENSG00000223972.4;transcript_id=ENST00000456328.2;gene_type=pseudogene;gene_status=KNOWN;gene_name=DDX11L1;transcript_type=processed_transcript;transcript_status=KNOWN;transcript_name=DDX11L1-002;exon_number=1;exon_id=ENSE00002234944.1;level=2;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000362751.1;tag=basic

Any help or suggestion would be highly appreciated.
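One possible way to do the grouping and joining described above, sketched in Python. It assumes a BED6 exon file whose name column ends in `<transcript_id>:<exon_number>` (as in `exon:ENST00000456328.2:1`), and a dictionary of exon sequences keyed as `chrom:start-end`, which is assumed here to match the FASTA headers written by getfasta when no `-name` option is used. The function names, transcript IDs and sequences in the example are hypothetical.

```python
from collections import defaultdict


def first_and_last_exons(bed_lines):
    """Return {transcript_id: (first_exon, last_exon)} for multi-exon transcripts."""
    by_tx = defaultdict(list)
    for line in bed_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6:
            continue
        parts = cols[3].split(":")
        if len(parts) < 2:
            continue  # name column does not carry a transcript ID
        tx_id = parts[-2]
        by_tx[tx_id].append((cols[0], int(cols[1]), int(cols[2]), cols[5]))
    pairs = {}
    for tx_id, exons in by_tx.items():
        if len(exons) < 2:
            continue  # a single exon has nothing to join
        # Order exons 5'->3': ascending start on '+', descending on '-'.
        exons.sort(key=lambda e: e[1], reverse=(exons[0][3] == "-"))
        pairs[tx_id] = (exons[0], exons[-1])
    return pairs


def join_first_last(pairs, seqs):
    """Concatenate first-exon and last-exon sequences for each transcript."""
    joined = {}
    for tx_id, (first, last) in pairs.items():
        keys = ["{}:{}-{}".format(e[0], e[1], e[2]) for e in (first, last)]
        if all(k in seqs for k in keys):
            joined[tx_id] = seqs[keys[0]] + seqs[keys[1]]
    return joined


# Hypothetical input: TX_A has three exons, TX_B only one.
bed = [
    "chr1\t100\t200\tTX_A:1\t.\t+",
    "chr1\t300\t400\tTX_A:2\t.\t+",
    "chr1\t500\t600\tTX_A:3\t.\t+",
    "chr2\t50\t80\tTX_B:1\t.\t+",
]
seqs = {"chr1:100-200": "ACGT", "chr1:500-600": "TTGA"}
print(join_first_last(first_and_last_exons(bed), seqs))
```

Grouping is done per transcript rather than per gene, because one gene can have several transcripts with different first and last exons. For minus-strand transcripts the sequences would also need to be reverse-complemented, which this sketch does not do.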


Hello krushnach80!

Questions similar to yours can already be found at:

We have closed your question to allow us to keep similar content in the same thread. There is no need to delete threads.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.


written 10 months ago by WouterDeCoster (35k)

10 months ago by
VIB, Ghent, Belgium
lieven.sterck (3.3k) wrote:

Question answered here :


Yes, I saw that, and I replied to you as well.

written 10 months ago by krushnach80 (440)


Then just mark this post as closed or as a duplicate ;-) I'm curious to hear how the script works out for you.

written 10 months ago by lieven.sterck (3.3k)