Question: Extracting Sequence from a FASTA file using biopython
5.2 years ago, priyankakukreja267 (India) wrote:

I want to extract some base pairs from a FASTA file. I am given a sequence, say a marker sequence. Every time this marker sequence occurs in the FASTA file, I want to extract the n base pairs immediately to the left of it (before it). Is it possible to do this using Biopython? If so, please tell me how.

Thank you.

PS: I know this seems quite simple, but I am very new to Python, and I have to do this using Biopython only. I am finding the cookbook very difficult to follow.

5.2 years ago, RamRS (Houston, TX) wrote:

One approach you could take:

  1. Use Bio.SeqIO to parse the FASTA file
  2. Use a regex to find the marker pattern in each sequence
  3. If found, slice the sequence at the match index from the regex to extract the target substring
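
The three steps above can be sketched roughly as follows. This is only a sketch under stated assumptions: the function names upstream_of_marker and upstream_from_fasta are hypothetical, the marker is treated as a literal string, and a marker closer than n bases to the start of a sequence is assumed to yield a shorter fragment.

```python
import re


def upstream_of_marker(sequence, marker, n):
    """Return the (up to) n bases immediately before each marker occurrence."""
    # The marker is escaped so it is matched literally. Wrapping it in a
    # zero-width lookahead means overlapping occurrences are all reported.
    pattern = re.compile("(?=" + re.escape(marker) + ")")
    fragments = []
    for match in pattern.finditer(sequence):
        start = match.start()
        # max(0, ...) stops a negative index from wrapping to the string's end.
        fragments.append(sequence[max(0, start - n):start])
    return fragments


def upstream_from_fasta(path, marker, n):
    """Yield (record id, fragments) for every record in a FASTA file."""
    # Imported here so the string helper above also works without Biopython.
    from Bio import SeqIO

    for record in SeqIO.parse(path, "fasta"):
        yield record.id, upstream_of_marker(str(record.seq), marker, n)
```

For instance, list(upstream_from_fasta("example.fasta", "GGG", 3)) would give one (id, fragments) pair per record, where "example.fasta" is a placeholder file name. Matching is case-sensitive, so a lower-case FASTA file would need its sequences upper-cased first.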

Why not just use regex capture groups? I believe this would be much more efficient. 
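
One way to read this suggestion, as a sketch only: the n bases are captured directly by the pattern instead of being sliced out afterwards. The function name capture_upstream is hypothetical. Note that this pattern only reports occurrences with a full n bases in front of the marker, and no timing has been done here to check whether it is actually faster.

```python
import re


def capture_upstream(sequence, marker, n):
    """Return the n bases preceding each marker occurrence, via a capture group."""
    # The whole pattern is a zero-width lookahead, so the scan advances one
    # position at a time and overlapping hits are found. Inside it, group 1
    # captures exactly n characters that must be followed by the literal marker.
    pattern = re.compile("(?=(.{%d})%s)" % (n, re.escape(marker)))
    return [match.group(1) for match in pattern.finditer(sequence)]
```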

written 5.2 years ago by Matt Shirley

The OP wishes to use Biopython. Come to think of it, the way the question is phrased makes me think this could be an assignment question.

written 5.2 years ago by RamRS

Agreed. 

written 5.2 years ago by Matt Shirley

Biopython is the only package that I am acquainted with. That's why I put special emphasis on using it.

But now, as it turns out, I'll have to learn regex too. Thank you for your help. :)

modified 5.2 years ago • written 5.2 years ago by priyankakukreja267
5.2 years ago, Peter (Scotland, UK) wrote:

As mentioned by RamRS, you could use Bio.SeqIO for parsing the FASTA file. Or if you prefer plain strings, from Bio.SeqIO.FastaIO import SimpleFastaParser might be useful?

If you are looking for an exact substring match, the Python string object and the Biopython Seq (sequence) object both offer a .find(...) method.
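
For the exact-match case, a minimal sketch using .find(...) on plain strings could look like this. The helper names upstream_by_find and upstream_in_file are hypothetical, and the sketch assumes that a marker closer than n bases to the start should give a shorter fragment. SimpleFastaParser yields (title, sequence) pairs of plain strings; with Bio.SeqIO, str(record.seq) would give the same kind of input.

```python
def upstream_by_find(sequence, marker, n):
    """Return the (up to) n bases before every exact occurrence of marker."""
    fragments = []
    pos = sequence.find(marker)
    while pos != -1:
        # Clamp at 0 so a marker near the start gives a shorter fragment
        # instead of a negative index wrapping around.
        fragments.append(sequence[max(0, pos - n):pos])
        # Resume one base further on so overlapping occurrences are found too.
        pos = sequence.find(marker, pos + 1)
    return fragments


def upstream_in_file(path, marker, n):
    """Yield (title, fragments) for each record in a FASTA file."""
    # Imported here so upstream_by_find stays usable without Biopython.
    from Bio.SeqIO.FastaIO import SimpleFastaParser

    with open(path) as handle:
        for title, sequence in SimpleFastaParser(handle):
            yield title, upstream_by_find(sequence, marker, n)
```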

If you are looking for a more complicated pattern, then, as RamRS suggested, the Python regular expression library (re) might be a good choice.
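
As one illustration of a "more complicated pattern", here is a sketch for a marker written with IUPAC ambiguity codes (for example R for A or G). Whether the marker in the question is degenerate is not stated, so this is purely an assumed scenario; the function name degenerate_upstream is hypothetical, and both marker and sequence are assumed to be upper-case DNA.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def degenerate_upstream(sequence, marker, n):
    """Return the (up to) n bases before each match of a degenerate marker."""
    # Translate the marker letter by letter; an unknown letter raises KeyError.
    body = "".join(IUPAC[base] for base in marker)
    # Zero-width lookahead so overlapping matches are reported as well.
    pattern = re.compile("(?=" + body + ")")
    return [
        sequence[max(0, match.start() - n):match.start()]
        for match in pattern.finditer(sequence)
    ]
```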

 
