Question

Extracting the exon sequence of a genomic region

9.8 years ago

firoz.imtech ▴ 50

Dear All,

I am new to BioPerl. I ran e-PCR version 2.3.11 on a genomic sequence and got an output file, "seq1.epcr", with the following command:

./e-PCR -w9 -f 1 -m5000 test.sts  WS240.genomic.fa  T=3 >seq1.epcr

"seq1.epcr" has eight tab-separated columns and looks like this:

Chr   STS_name   strand    start    end     length/5000-5000    gap  mismatch
I       FOR_F32H2.2     +       8966315 8966961 647/5000-5000   0       0
I       FOR_Y54E10BR.d  -       3028477 3031091 2615/5000-5000  0       0
III     FOR_B0280.1.v5  +       7133931 7135112 1182/5000-5000  0       0
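
For reference, here is a minimal Python sketch of how rows like these could be read into records. The `Hit` type and `parse_epcr` function are hypothetical names made up for illustration, not part of e-PCR or BioPerl. It assumes whitespace-delimited fields in the column order shown above and skips any row without numeric coordinates, such as the header line.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    """One e-PCR hit (hypothetical record type for this sketch)."""
    chrom: str
    sts: str
    strand: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def parse_epcr(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield Hit records from whitespace-delimited e-PCR output lines."""
    for line in lines:
        fields = line.split()
        # Skip blank lines and rows without numeric coordinates (e.g. a header).
        if len(fields) < 5 or not (fields[3].isdigit() and fields[4].isdigit()):
            continue
        yield Hit(fields[0], fields[1], fields[2], int(fields[3]), int(fields[4]))


sample = """\
Chr   STS_name   strand    start    end     length/5000-5000    gap  mismatch
I       FOR_F32H2.2     +       8966315 8966961 647/5000-5000   0       0
I       FOR_Y54E10BR.d  -       3028477 3031091 2615/5000-5000  0       0
III     FOR_B0280.1.v5  +       7133931 7135112 1182/5000-5000  0       0
"""

hits = list(parse_epcr(sample.splitlines()))
for hit in hits:
    # end - start + 1 matches the reported amplicon length in the sample rows,
    # which suggests the coordinates are 1-based and inclusive.
    print(hit.sts, hit.chrom, hit.strand, hit.end - hit.start + 1)
```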

Now I want to extract the amplicon sequences in FASTA format from "WS240.genomic.fa" according to the STS hits in "seq1.epcr".

However, the amplicon sequences should meet two requirements:

They should contain only exon sequence.
If a primer hits a non-exon (intron) region, only the exon sequence should be taken, with a note saying whether the forward or the reverse primer hit the intron region.
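
To make these two requirements concrete, here is a rough Python sketch of the interval logic only. All function names are hypothetical, and it rests on several assumptions: coordinates are 1-based and inclusive, the exons are non-overlapping (start, end) pairs already pulled from the GFF3 file for the right chromosome, a primer is judged only by the amplicon end it sits at (primer lengths are not in the e-PCR output), and for a "-" strand hit the forward primer is taken to sit at the right-hand end. It does not reverse-complement anything.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive


def exon_parts(start: int, end: int, exons: Sequence[Interval]) -> List[Interval]:
    """Return the pieces of [start, end] that overlap the given exons."""
    parts = []
    for ex_start, ex_end in sorted(exons):
        lo, hi = max(start, ex_start), min(end, ex_end)
        if lo <= hi:  # the amplicon and this exon overlap
            parts.append((lo, hi))
    return parts


def primer_notes(start: int, end: int, exons: Sequence[Interval],
                 strand: str = "+") -> List[str]:
    """Name the primers whose amplicon end falls outside every exon."""
    def in_exon(pos: int) -> bool:
        return any(s <= pos <= e for s, e in exons)

    # Assumption: on "+" the forward primer is at the left end, on "-" at the right.
    left, right = ("forward", "reverse") if strand == "+" else ("reverse", "forward")
    notes = []
    if not in_exon(start):
        notes.append(f"{left} primer hit a non-exon region")
    if not in_exon(end):
        notes.append(f"{right} primer hit a non-exon region")
    return notes


def exon_sequence(chrom_seq: str, parts: Sequence[Interval]) -> str:
    """Concatenate the exon pieces from a chromosome sequence string."""
    return "".join(chrom_seq[lo - 1:hi] for lo, hi in parts)


# Toy data, not real annotation: two exons and an amplicon starting in the intron.
toy_exons = [(5, 10), (20, 30)]
toy_chrom = "ACGT" * 10
parts = exon_parts(12, 25, toy_exons)
print(parts, primer_notes(12, 25, toy_exons, "+"), exon_sequence(toy_chrom, parts))
```

With real data, the exon list would come from the GFF3 features on the hit's chromosome and the sequence string from the matching FASTA record.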

Could you please tell me how I can use a GFF3 annotation file with Bio::Tools::EPCR to extract my amplicon sequences, or suggest another method to do the same?

Note: I have also loaded the GFF3 file and the genomic sequence into a MySQL database using bp_seqfeature_load.pl.

Thanks
Firoz

genome sequence • 2.2k views

updated 2.5 years ago by Ram • written 9.8 years ago by firoz.imtech

What about trying existing tools?

Try gff2fasta

updated 2.5 years ago by Ram • written 9.8 years ago by Fabio Marroni