Off topic: Parse out exons for divergent primer design
6.3 years ago
1769mkc ★ 1.2k

I am trying to parse out exons (mature exons). What I have done so far: I took the exon coordinates from the GTF file, converted them into a BED file, and then used those BED coordinates to extract the exon sequences from the FASTA file with the bedtools getfasta command.

The command I used is:

bedtools getfasta -fi hg19.fa -bed exon.bed -fo -exon_Seq -split

I would be glad to know whether I am using the command correctly or not.
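
For reference, the GTF-to-BED step described above can be sketched in Python roughly like this. It is only a sketch: the function name `gtf_exons_to_bed` is made up for illustration, and it assumes a standard nine-column, tab-separated GTF whose coordinates are 1-based and inclusive, so the start is shifted by one to get BED's 0-based, half-open interval. The example line is reconstructed for illustration and is not copied from my GTF file.

```python
def gtf_exons_to_bed(gtf_lines):
    """Yield BED6 rows (chrom, start, end, name, score, strand) for exon features.

    Hypothetical helper: assumes 9 tab-separated GTF columns.
    """
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep exon records only
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # GTF is 1-based inclusive; BED is 0-based half-open, so only the start shifts.
        yield (chrom, start - 1, end, "exon", ".", strand)


if __name__ == "__main__":
    # Illustrative GTF line (assumption, not taken from the real file).
    example = ['chr1\tHAVANA\texon\t11869\t12227\t.\t+\t.\tgene_id "ENSG00000223972.4";']
    for row in gtf_exons_to_bed(example):
        print("\t".join(str(value) for value in row))
```

If the start is not shifted like this, every extracted sequence would begin one base too late or too early, which matters a lot for primer design.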

So now I have a file with the exon sequences for the whole genome. How do I parse out the mature exon sequences from this file? The file looks like this:

>chr1:11871-12227
AACTTGCCGTCAGCCTTTTCTTTGACCTCTTCTTTCTGTTCATGTGTATTTGCTGTCTCTTAGCCCAGACTTCCCGTGTCCTTTCCACCGGGCCTTTGAGAGGTCACAGGGTCTTGATGCTGTGGTCTTCATCTGCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGAGAGCATCAACTTCTCTCACAACCTAGGCCA
>chr1:11873-12227
CTTGCCGTCAGCCTTTTCTTTGACCTCTTCTTTCTGTTCATGTGTATTTGCTGTCTCTTAGCCCAGACTTCCCGTGTCCTTTCCACCGGGCCTTTGAGAGGTCACAGGGTCTTGATGCTGTGGTCTTCATCTGCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGAGAGCATCAACTTCTCTCACAACCTAGGCCA

This is just a small subset of my data. How do I decide on and find the mature sequences for each gene? After that, say I have a gene with 5 exons: I want to join the first exon and the last exon and use the joined sequence for downstream analysis.
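
To make the goal concrete, here is a rough Python sketch of the joining step. The names `read_fasta` and `join_first_last` are hypothetical, and the sketch assumes the exon headers of one transcript are already known and listed in transcript order. It does not handle minus-strand genes, where the sequences would first need to be reverse-complemented.

```python
def read_fasta(lines):
    """Parse FASTA text lines into {header_without_'>': sequence}."""
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {key: "".join(parts) for key, parts in records.items()}


def join_first_last(records, exon_keys):
    """Concatenate the first and last exon sequence of one transcript.

    exon_keys: headers such as 'chr1:11871-12227', in transcript order (assumed).
    """
    if not exon_keys:
        raise ValueError("no exons given")
    first = records[exon_keys[0]]
    if len(exon_keys) == 1:
        return first  # single-exon transcript: nothing to join
    return first + records[exon_keys[-1]]
```

The dictionary keys are the `chrom:start-end` headers that getfasta writes, so the exon list for a transcript can be built directly from BED coordinates.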

So far I have done all of the above with shell scripts. I guess R wouldn't be of much use and Perl would be needed for the parsing, so how do I do this?

UPDATE

Below is the output from my coordinates BED file. I can see the entries chr1 11868 12227 + exon and chr1 11871 12227 + exon, along with the corresponding information from the GTF file, 11868 12227 exon:ENST00000456328.2:1. From this I understand that the respective gene has two exons, so I used these exon coordinates with bedtools getfasta to extract the respective exon sequences. Now I want to join the coordinates 11868 12227 and 11871 12227, which will give me the mature sequence. This is only an example; the full file has hundreds of exon coordinates, with one or more exons for a given gene. How do I parse out the mature coordinates, join the first and last exon of each gene, and get a sequence?

cat coordinates.bed | head -5
chr1    11868   12227   +   exon
chr1    11868   14409   +   transcript
chr1    11868   14412   +   gene
chr1    11871   12227   +   exon
chr1    11871   14412   +   transcript
 cat annotation.bed | head -1
chr1    11868   12227   exon:ENST00000456328.2:1    .   +   HAVANA  exon    .   ID=exon:ENST00000456328.2:1;Parent=ENST00000456328.2;gene_id=ENSG00000223972.4;transcript_id=ENST00000456328.2;gene_type=pseudogene;gene_status=KNOWN;gene_name=DDX11L1;transcript_type=processed_transcript;transcript_status=KNOWN;transcript_name=DDX11L1-002;exon_number=1;exon_id=ENSE00002234944.1;level=2;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000362751.1;tag=basic
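
Based on the annotation.bed layout above, this is roughly the grouping I am after, sketched in Python. `parse_attributes` and `first_last_exons` are made-up names, and the sketch assumes the rows are tab-separated, that column 8 holds the feature type, and that column 10 always carries `key=value;` attributes including `transcript_id` and `exon_number`, as in the line shown. It groups by transcript rather than by gene, which is an assumption on my part.

```python
from collections import defaultdict


def parse_attributes(attr_field):
    """Turn 'key=value;key=value' into a dict."""
    items = (item.split("=", 1) for item in attr_field.strip().split(";") if "=" in item)
    return {key: value for key, value in items}


def first_last_exons(bed_lines):
    """Return {transcript_id: (first_exon, last_exon)} as 'chrom:start-end' keys.

    Assumes feature type in column 8 and attributes in column 10 (1-based).
    """
    exons = defaultdict(list)
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 or fields[7] != "exon":
            continue  # ignore gene/transcript rows and malformed lines
        attrs = parse_attributes(fields[9])
        key = f"{fields[0]}:{fields[1]}-{fields[2]}"
        exons[attrs["transcript_id"]].append((int(attrs["exon_number"]), key))
    result = {}
    for transcript, numbered in exons.items():
        numbered.sort()  # order exons by exon_number
        result[transcript] = (numbered[0][1], numbered[-1][1])
    return result
```

The returned keys use the same `chrom:start-end` form as the getfasta headers, so they could be used to look up the two sequences to join.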

Any help or suggestion would be highly appreciated.
