Extracting Sequence from a FASTA file using biopython
9.2 years ago

I want to extract some base pairs from a FASTA file. Given a marker sequence, every time it occurs in the FASTA file I want to extract the n base pairs immediately to its left (before it). Is this possible using Biopython? If so, please tell me how.

Thank you.

PS - I know this seems quite simple, but I am very new to Python, and I have to do this using Biopython only. I am finding it very difficult to understand from the cookbook.

biopython fasta • 5.1k views
9.2 years ago
Ram 43k

Approach that you could take:

  1. Use Bio.SeqIO to parse the FASTA file
  2. Use a regex to find the marker pattern in each sequence
  3. If found, slice the sequence using the start index of the regex match to extract the target substring
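
The three steps above could be sketched roughly as follows. This is only an illustration, not a tested solution for your data: the function names, `path`, `marker`, and `n` are placeholders I made up, and it assumes the marker is a literal string of bases. When the marker sits closer than n bases to the start of a record, this version returns whatever shorter prefix is available.

```python
import re


def upstream_of_marker(sequence, marker, n):
    """Return the (up to) n bases immediately before each occurrence of marker."""
    hits = []
    # The lookahead makes each match zero-width, so overlapping
    # occurrences of the marker are all reported.
    pattern = re.compile("(?=" + re.escape(marker) + ")")
    for match in pattern.finditer(sequence):
        start = match.start()
        # Slice from n bases before the marker up to the marker itself;
        # max(0, ...) avoids a negative index near the start of the sequence.
        hits.append(sequence[max(0, start - n):start])
    return hits


def extract_from_fasta(path, marker, n):
    """Yield (record id, upstream fragments) for every record in a FASTA file."""
    # Biopython is imported here so the helper above needs only the standard library.
    from Bio import SeqIO

    for record in SeqIO.parse(path, "fasta"):
        # Compare in upper case so soft-masked (lower-case) bases still match.
        yield record.id, upstream_of_marker(str(record.seq).upper(), marker.upper(), n)
```

For example, `upstream_of_marker("AAACCGGTTTCCGG", "CCGG", 3)` gives `["AAA", "TTT"]`.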

Why not just use regex capture groups? I believe this would be much more efficient.
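
A capture-group version might look something like this. It is a sketch under my own assumptions (the function name is invented, and the marker is treated as a literal string); I have not benchmarked it, so take the efficiency point as a guess. Unlike slicing by match index, it only reports hits that have a full n bases in front of the marker.

```python
import re


def upstream_by_capture(sequence, marker, n):
    """Return the n bases before each marker occurrence, using a capture group."""
    # (.{n}) captures exactly n characters that are directly followed by the marker.
    # Wrapping everything in a lookahead keeps matches zero-width, so
    # overlapping hits are not skipped.
    pattern = re.compile("(?=(.{%d})%s)" % (n, re.escape(marker)))
    # With a single group, findall returns the captured strings directly.
    return pattern.findall(sequence)
```

Here `upstream_by_capture("AAACCGGTTTCCGG", "CCGG", 3)` returns `["AAA", "TTT"]`, while a marker only one base from the start yields no hit for n = 3.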


The OP wishes to use Biopython. Come to think of it, the way the question is phrased makes me think this could be an assignment question.


Agreed.


Biopython is the only package that I am acquainted with. That's why the special emphasis on using it.

But now, as it turns out, I'll have to learn regex too. Thank you for your help. :)

9.2 years ago
Peter 6.0k

As RamRS mentioned, you could use Bio.SeqIO to parse the FASTA file. Or, if you prefer plain strings, SimpleFastaParser (from Bio.SeqIO.FastaIO import SimpleFastaParser) might be useful.

If you are looking for an exact substring match, both the Python string object and the Biopython Seq (sequence) object offer a .find(...) method.
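
For the exact-match case, a minimal sketch using plain strings and .find(...) could look like this. The function names and the `path`, `marker`, and `n` arguments are placeholders of mine, and the search is case-sensitive as written, so you may want to upper-case both sequence and marker first.

```python
def upstream_with_find(sequence, marker, n):
    """Return the (up to) n bases before each exact occurrence of marker."""
    fragments = []
    pos = sequence.find(marker)
    while pos != -1:
        # Take the n bases ending where the marker begins; the slice is
        # shorter when the marker is within n bases of the sequence start.
        fragments.append(sequence[max(0, pos - n):pos])
        # Resume one position later so overlapping occurrences are found too.
        pos = sequence.find(marker, pos + 1)
    return fragments


def scan_fasta(path, marker, n):
    """Yield (title, upstream fragments) for each record, using plain strings."""
    # Biopython is imported here so the helper above needs only the standard library.
    from Bio.SeqIO.FastaIO import SimpleFastaParser

    with open(path) as handle:
        # SimpleFastaParser yields (title, sequence) tuples of plain strings.
        for title, sequence in SimpleFastaParser(handle):
            yield title, upstream_with_find(sequence, marker, n)
```

For instance, `upstream_with_find("AAACCGGTTTCCGG", "CCGG", 3)` gives `["AAA", "TTT"]`.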

If you are looking for a more complicated pattern, then as RamRS suggested the Python regular expression library might be a good choice.
