Dear All,
I am new to BioPerl. I ran e-PCR version 2.3.11 on a genomic sequence and got an output file "seq1.epcr" with the following command:
./e-PCR -w9 -f 1 -m5000 test.sts WS240.genomic.fa T=3 >seq1.epcr
"seq1.epcr" has eight tab-separated columns and looks like this:
Chr STS_name strand start end length/5000-5000 gap mismatch
I FOR_F32H2.2 + 8966315 8966961 647/5000-5000 0 0
I FOR_Y54E10BR.d - 3028477 3031091 2615/5000-5000 0 0
III FOR_B0280.1.v5 + 7133931 7135112 1182/5000-5000 0 0
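For reference, a table like this can be read without any special module. The following is only a minimal Python sketch, assuming the eight columns appear in exactly the order listed above and that the header line is not part of the real file; the names Hit and parse_epcr are made up for illustration and are not part of e-PCR or BioPerl.

```python
import csv
import io
from typing import NamedTuple


class Hit(NamedTuple):
    """One e-PCR hit, using the column order shown in the post (an assumption)."""
    chrom: str
    sts: str
    strand: str
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    length: int      # observed amplicon length (the part before "/")
    gaps: int
    mismatches: int


def parse_epcr(handle):
    """Parse tab-separated e-PCR hits; lines without numeric coordinates are skipped."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 8 or not row[3].isdigit():
            continue  # blank line, header line, or unexpected layout
        observed = int(row[5].split("/")[0])
        hits.append(Hit(row[0], row[1], row[2], int(row[3]), int(row[4]),
                        observed, int(row[6]), int(row[7])))
    return hits


# The three hits from the post, rebuilt with tabs for the example.
sample = (
    "I\tFOR_F32H2.2\t+\t8966315\t8966961\t647/5000-5000\t0\t0\n"
    "I\tFOR_Y54E10BR.d\t-\t3028477\t3031091\t2615/5000-5000\t0\t0\n"
    "III\tFOR_B0280.1.v5\t+\t7133931\t7135112\t1182/5000-5000\t0\t0\n"
)
hits = parse_epcr(io.StringIO(sample))
for hit in hits:
    print(hit.chrom, hit.sts, hit.strand, hit.start, hit.end)
```

In these three hits, end - start + 1 equals the reported length, which suggests the coordinates are 1-based and inclusive.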
Now I want to extract the amplicon sequences in FASTA format from "WS240.genomic.fa" according to the STS hits in "seq1.epcr".
However, the amplicon sequences should meet the following requirements:
- They contain only exon sequence.
- If a primer hits a non-exon (intron) region, only the exon sequence is taken, and the output notes whether the forward or the reverse primer hit the intron region.
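To make these two requirements concrete, here is a rough Python sketch of one possible reading of them, not a tested pipeline. It assumes the exon coordinates for the chromosome have already been pulled from the GFF3 file as 1-based inclusive (start, end) pairs, that a primer counts as "in an exon" when the amplicon end it sits on falls inside an exon, and that for a "-" strand hit the forward primer lies at the right-hand end of the amplicon. All function names are made up for illustration.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def clip_to_exons(start, end, exons):
    """Return the parts of [start, end] that overlap any exon."""
    pieces = []
    for ex_start, ex_end in merge_intervals(exons):
        lo, hi = max(start, ex_start), min(end, ex_end)
        if lo <= hi:
            pieces.append((lo, hi))
    return pieces


def primer_notes(start, end, strand, exons):
    """Report which primer end of the amplicon falls outside every exon."""
    def in_exon(pos):
        return any(s <= pos <= e for s, e in exons)

    # Assumption: on "+" the forward primer is at the left end, on "-" at the right.
    left, right = ("forward", "reverse") if strand == "+" else ("reverse", "forward")
    notes = []
    if not in_exon(start):
        notes.append(f"{left} primer hits a non-exon region")
    if not in_exon(end):
        notes.append(f"{right} primer hits a non-exon region")
    return notes


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def exon_only_amplicon(chrom_seq, start, end, strand, exons):
    """Concatenate the exon pieces of the amplicon, reverse-complemented for "-" hits."""
    pieces = clip_to_exons(start, end, exons)
    seq = "".join(chrom_seq[lo - 1:hi] for lo, hi in pieces)
    return reverse_complement(seq) if strand == "-" else seq


# Toy data (not from WS240): a 20 bp "chromosome" with two exons.
toy_chrom = "AAAACCCCGGGGTTTTAAAA"
toy_exons = [(3, 6), (11, 14)]
print(exon_only_amplicon(toy_chrom, 5, 16, "+", toy_exons))
print(primer_notes(5, 16, "+", toy_exons))
```

The primer check only looks at the two end positions of the amplicon; a stricter version would need the primer lengths from test.sts, which are not in the e-PCR output.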
Could you please tell me how I can use a GFF3 annotation file with Bio::Tools::EPCR
to extract my amplicon sequences, or whether there is another way to do the same?
Note: I have also loaded the GFF3 file and the genomic sequence into a MySQL database using bp_seqfeature_load.pl.
Thanks
Firoz
What about trying existing tools?
Try gff2fasta.