Question: Extracting the exon sequence of a genomic region
5.7 years ago by firoz.imtech (United States):

Dear All,

I am new to BioPerl. I ran e-PCR version 2.3.11 on a genomic sequence and got an output file, "seq1.epcr", with the following command:

./e-PCR -w9 -f 1 -m5000 test.sts  WS240.genomic.fa  T=3 >seq1.epcr

"seq1.epcr" has eight tab-separated columns and looks like this:

Chr   STS_name        strand  start    end      length/5000-5000  gap  mismatch
I     FOR_F32H2.2     +       8966315  8966961  647/5000-5000     0    0
I     FOR_Y54E10BR.d  -       3028477  3031091  2615/5000-5000    0    0
III   FOR_B0280.1.v5  +       7133931  7135112  1182/5000-5000    0    0
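For illustration, the eight columns above could be parsed as follows. This is only a minimal Python sketch, not part of e-PCR or BioPerl. The names `EpcrHit`, `parse_epcr_line` and `parse_epcr` are hypothetical, and the coordinates are assumed to be 1-based and inclusive, which is consistent with the sample rows (end - start + 1 equals the reported length).

```python
from typing import List, NamedTuple


class EpcrHit(NamedTuple):
    """One row of the e-PCR output shown above (field names are hypothetical)."""
    chrom: str
    sts: str
    strand: str
    start: int       # assumed 1-based, inclusive
    end: int         # assumed 1-based, inclusive
    length: int      # observed amplicon length (the part before the "/")
    gaps: int
    mismatches: int


def parse_epcr_line(line: str) -> EpcrHit:
    # Split on any whitespace so that tabs and padded spaces both work.
    fields = line.split()
    if len(fields) != 8:
        raise ValueError("expected 8 columns, got %d: %r" % (len(fields), line))
    chrom, sts, strand, start, end, length_field, gaps, mismatches = fields
    # The sixth column looks like "647/5000-5000"; keep the observed length.
    observed = int(length_field.split("/")[0])
    return EpcrHit(chrom, sts, strand, int(start), int(end),
                   observed, int(gaps), int(mismatches))


def parse_epcr(text: str) -> List[EpcrHit]:
    hits = []
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        hits.append(parse_epcr_line(line))
    return hits
```

Reading the whole file would then be `parse_epcr(open("seq1.epcr").read())`.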

Now I want to extract the amplicon sequences in FASTA format from "WS240.genomic.fa" according to the STS hits in "seq1.epcr".
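To make this step concrete, here is a rough Python sketch of cutting one amplicon out of a FASTA file by coordinates. It is an illustration only, using no BioPerl. All function names are hypothetical. It assumes that the chromosome names in the e-PCR output match the first word of the FASTA headers, that coordinates are 1-based and inclusive, and that a "-" hit should be reported as the reverse complement.

```python
from typing import Dict, Iterable

# Complement table for unambiguous bases plus N (other IUPAC codes pass through).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA records into a dict keyed by the first word of each header."""
    seqs: Dict[str, str] = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def extract_amplicon(genome: Dict[str, str], chrom: str,
                     start: int, end: int, strand: str) -> str:
    # Convert 1-based inclusive coordinates to a Python slice.
    sub = genome[chrom][start - 1:end]
    # Assumption: minus-strand hits are reported as the reverse complement.
    return reverse_complement(sub) if strand == "-" else sub


def format_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one record with fixed-width sequence lines."""
    wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(wrapped) + "\n"
```

A file handle can be passed directly, for example `read_fasta(open("WS240.genomic.fa"))`. This keeps the whole genome in memory, which is a deliberate simplification.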

However, the amplicon sequences should contain:

(1) Only exon sequence

(2) If a primer hits a non-exon (intron) region, take only the exon sequence and report whether the forward or the reverse primer hit the intron region.
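The two requirements above could be sketched roughly as follows in Python. This is not the Bio::Tools::EPCR approach, and every function name is hypothetical. It assumes the GFF3 file marks exons with the feature type `exon`. It also assumes that on a "+" hit the forward primer sits at the amplicon start and the reverse primer at the amplicon end, with the roles swapped on a "-" hit. Primer lengths are not in the e-PCR output, so only the two end positions are checked, and a primer that straddles an exon boundary would not be detected.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive, as in GFF3


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def read_exons(gff_lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Collect merged exon intervals per sequence ID from GFF3 lines."""
    exons: Dict[str, List[Interval]] = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Assumption: exons are annotated with the feature type "exon".
        if len(cols) < 9 or cols[2] != "exon":
            continue
        exons.setdefault(cols[0], []).append((int(cols[3]), int(cols[4])))
    return {chrom: merge(ivs) for chrom, ivs in exons.items()}


def clip_to_exons(start: int, end: int, exons: List[Interval]) -> List[Interval]:
    """Return the parts of [start, end] that overlap an exon."""
    return [(max(start, s), min(end, e))
            for s, e in exons if s <= end and e >= start]


def in_exon(pos: int, exons: List[Interval]) -> bool:
    return any(s <= pos <= e for s, e in exons)


def primer_notes(start: int, end: int, strand: str,
                 exons: List[Interval]) -> List[str]:
    # Assumption: forward primer at the start on "+", at the end on "-".
    fwd_pos, rev_pos = (start, end) if strand == "+" else (end, start)
    notes = []
    if not in_exon(fwd_pos, exons):
        notes.append("forward primer hits a non-exon region")
    if not in_exon(rev_pos, exons):
        notes.append("reverse primer hits a non-exon region")
    return notes
```

Concatenating the genomic substrings for the intervals returned by `clip_to_exons` would give the exon-only amplicon, and the strings from `primer_notes` could be appended to the FASTA header.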

Could you please tell me how I can use a GFF3 annotation file with Bio::Tools::EPCR to extract my amplicon sequences, or suggest any other method to do the same?

Note: I have also loaded the GFF3 annotation and the genomic sequence into a MySQL database using "bp_seqfeature_load.pl".

Thanks

Firoz

sequence genome • 1.5k views
modified 3.0 years ago by Biostar • written 5.7 years ago by firoz.imtech

What about trying existing tools?

Try gff2fasta: https://github.com/minillinim/gff2fasta

modified 5.7 years ago • written 5.7 years ago by Fabio Marroni