Question: (Closed) Parse out exon for divergent primer design
3.0 years ago by
krushnach80 wrote:

I am trying to parse out exons (mature exons). What I have done so far: I took the exon coordinates from the GTF file, converted them to a BED file, and then used those BED coordinates to extract the exon sequences from the FASTA file with bedtools getfasta.

The command I used is:

bedtools getfasta -fi hg19.fa -bed exon.bed -fo -exon_Seq -split

I would be glad to know whether this command is correct.
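
For reference, the GTF-to-BED step described above involves a coordinate shift that is easy to get wrong, so here is a minimal Python sketch of it. The function name `gtf_exon_to_bed` and the example line are illustrative assumptions (the line is modeled on the DDX11L1 record shown further down, not copied from the actual GTF), and the attribute parsing assumes the usual GTF `key "value";` style.

```python
import re

def gtf_exon_to_bed(line):
    """Convert one GTF line to a BED6 tuple; return None for non-exon lines."""
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9 or fields[2] != "exon":
        return None
    chrom, strand, attrs = fields[0], fields[6], fields[8]
    start, end = int(fields[3]), int(fields[4])
    # Use the transcript ID as the BED name; fall back to "." if it is absent.
    match = re.search(r'transcript_id "([^"]+)"', attrs)
    name = match.group(1) if match else "."
    # GTF is 1-based and inclusive; BED is 0-based and half-open,
    # so only the start coordinate is shifted.
    return (chrom, start - 1, end, name, ".", strand)

# Illustrative GTF line (an assumption, modeled on the record in this post).
example = ('chr1\tHAVANA\texon\t11869\t12227\t.\t+\t.\t'
           'gene_id "ENSG00000223972.4"; transcript_id "ENST00000456328.2";')
print("\t".join(map(str, gtf_exon_to_bed(example))))
```

Keeping the strand in column 6 matters if minus-strand exons should be reverse-complemented: as far as I know, `bedtools getfasta` only does that when `-s` is given, and `-split` only has an effect on BED12 records with blocks.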

So now I have a file with the exon sequences for the whole genome. How do I parse out the mature exon sequences from that file? In the file I have entries such as:


That is just a small subset of my data. How do I identify the mature sequences for each gene? After that, say I have a gene with 5 exons: I want to join the first exon and the last exon and use the result for downstream analysis.
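
To make the "join the first and the last exon" step concrete, here is a rough Python sketch. It assumes the exon sequences have already been loaded as `(transcript_id, exon_number, sequence)` records; that record layout, the function name `join_first_last`, and the toy sequences are all hypothetical and only serve as illustration.

```python
from collections import defaultdict

def join_first_last(exons):
    """Map transcript_id -> first exon sequence + last exon sequence.

    exons: iterable of (transcript_id, exon_number, sequence) records.
    Single-exon transcripts are returned unchanged.
    """
    by_tx = defaultdict(list)
    for tx, number, seq in exons:
        by_tx[tx].append((number, seq))
    joined = {}
    for tx, items in by_tx.items():
        items.sort()  # order exons by exon_number
        first, last = items[0][1], items[-1][1]
        joined[tx] = first if len(items) == 1 else first + last
    return joined

# Toy records (hypothetical sequences, not real exon data).
records = [
    ("tx1", 3, "GGGT"),
    ("tx1", 1, "AAAC"),
    ("tx1", 2, "CCCC"),
    ("tx2", 1, "TTTT"),
]
joined = join_first_last(records)
print(joined)
```

Grouping by transcript rather than by gene avoids mixing exons from different isoforms of the same gene, which would otherwise give a "first" and "last" exon that never occur together.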

So far I have done all of the above with shell scripts. I guess R wouldn't be much use and Perl would be needed for the parsing. How do I do it?


Below is the output from my coordinates BED file. I can see chr1 11868 12227 + exon and chr1 11871 12227 + exon, and the corresponding record from the GTF file, 11868 12227 exon:ENST00000456328.2:1. From this I understand that the gene has two exons, so I used these exon coordinates with bedtools getfasta to extract the exon sequences. Now I want to join the coordinates 11868 12227 and 11871 12227, which will give me the mature sequence.

This is just an example; there are hundreds of exon coordinates like this, with one or more exons per gene. How do I parse out the mature coordinates, join the first and last exon of each gene, and get a sequence?

cat coordinates.bed | head -5
chr1    11868   12227   +   exon
chr1    11868   14409   +   transcript
chr1    11868   14412   +   gene
chr1    11871   12227   +   exon
chr1    11871   14412   +   transcript
 cat annotation.bed | head -1
chr1    11868   12227   exon:ENST00000456328.2:1    .   +   HAVANA  exon    .   ID=exon:ENST00000456328.2:1;Parent=ENST00000456328.2;gene_id=ENSG00000223972.4;transcript_id=ENST00000456328.2;gene_type=pseudogene;gene_status=KNOWN;gene_name=DDX11L1;transcript_type=processed_transcript;transcript_status=KNOWN;transcript_name=DDX11L1-002;exon_number=1;exon_id=ENSE00002234944.1;level=2;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000362751.1;tag=basic
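
One possible way to get the first and last exon coordinates per transcript from a file like annotation.bed, as a minimal Python sketch. It assumes tab-separated columns laid out as in the line above (feature type in column 8, `key=value;` attributes in column 10) and that `exon_number` follows transcript order. The function names are made up for illustration, the first sample record is shortened from the line above, and the second sample record uses hypothetical coordinates.

```python
from collections import defaultdict

def parse_attrs(field):
    """Turn 'key=value;key=value' into a dict."""
    return dict(item.split("=", 1) for item in field.split(";") if "=" in item)

def first_last_exons(bed_lines):
    """Map transcript_id -> (first_exon, last_exon), each (chrom, start, end, strand)."""
    by_tx = defaultdict(list)
    for line in bed_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10 or cols[7] != "exon":
            continue  # skip gene/transcript rows and malformed lines
        attrs = parse_attrs(cols[9])
        by_tx[attrs["transcript_id"]].append(
            (int(attrs["exon_number"]), cols[0], int(cols[1]), int(cols[2]), cols[5])
        )
    result = {}
    for tx, exons in by_tx.items():
        exons.sort()  # assumes exon_number reflects transcript order
        result[tx] = (exons[0][1:], exons[-1][1:])
    return result

tx_id = "ENST00000456328.2"
attrs1 = f"ID=exon:{tx_id}:1;Parent={tx_id};transcript_id={tx_id};exon_number=1"
attrs2 = f"ID=exon:{tx_id}:2;Parent={tx_id};transcript_id={tx_id};exon_number=2"
lines = [
    "\t".join(["chr1", "11868", "12227", f"exon:{tx_id}:1", ".", "+", "HAVANA", "exon", ".", attrs1]),
    # Hypothetical second exon, coordinates invented for illustration only.
    "\t".join(["chr1", "13000", "13100", f"exon:{tx_id}:2", ".", "+", "HAVANA", "exon", ".", attrs2]),
]
result = first_last_exons(lines)
print(result[tx_id])
```

The two intervals per transcript could then be written out as BED lines and passed to bedtools getfasta, and the resulting sequences concatenated.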

Any help or suggestion would be highly appreciated.


Hello krushnach80!

Questions similar to yours can already be found at:

We have closed your question to allow us to keep similar content in the same thread. There is no need to delete threads.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.


written 3.0 years ago by WouterDeCoster

3.0 years ago by
VIB, Ghent, Belgium
lieven.sterck wrote:

Question answered here :


Yes, I saw that, and I replied to you there as well.

written 3.0 years ago by krushnach80


Then just mark this post as closed or duplicate ;-) I'm curious to hear how the script works out for you.

written 3.0 years ago by lieven.sterck