Question

Extracting sequences from a FASTA file using Biopython

9.2 years ago

priyankakukreja267 • 0

I want to extract some base pairs from a FASTA file. I have a given sequence; call it the marker sequence. Every time this marker sequence occurs in the FASTA file, I want to extract the n base pairs to its left (immediately before it). Is it possible to do this using Biopython? If so, please explain how.

Thank you.

PS: I know this seems quite simple, but I am very new to Python and I have to do this using Biopython only, so I am finding it very difficult to work out from the cookbook.

biopython fasta • 5.1k views

Updated 2.1 years ago by Ram 43k • written 9.2 years ago by priyankakukreja267 • 0

Ram · Answer 1 · 2015-02-13

9.2 years ago

Ram 43k

An approach you could take:

1. Use Bio.SeqIO to parse the FASTA file.
2. Use a regular expression (Python's re module) to search each record's sequence for the marker.
3. If a match is found, use its start index to slice out the n bases that precede it.

2.1 years ago by Ram 43k
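A minimal sketch of the three steps above, assuming Biopython is installed and the marker is an exact sequence. The function names and the file name input.fasta are hypothetical, not part of Biopython. It uses SeqIO.parse to read records, re.finditer with a zero-width lookahead so overlapping occurrences are also reported, and slicing to take up to n bases before each match.

```python
import re


def upstream_of_marker(seq, marker, n):
    """Return (position, flank) for each occurrence of marker in seq.

    flank holds up to n bases immediately before the marker.
    """
    hits = []
    # A zero-width lookahead makes overlapping occurrences match as well.
    pattern = re.compile("(?=" + re.escape(marker) + ")")
    for match in pattern.finditer(seq):
        start = match.start()
        # max(0, ...) stops a marker near the start of the sequence from
        # producing a negative slice index; its flank is just shorter than n.
        hits.append((start, seq[max(0, start - n):start]))
    return hits


def upstream_in_fasta(path, marker, n):
    """Yield (record id, position, flank) for every marker hit in a FASTA file."""
    # Requires Biopython; imported here so upstream_of_marker works without it.
    from Bio import SeqIO

    for record in SeqIO.parse(path, "fasta"):
        # str() turns the Seq into a plain string for the re module; upper()
        # lets lower-case (soft-masked) bases match an upper-case marker.
        seq = str(record.seq).upper()
        for start, flank in upstream_of_marker(seq, marker.upper(), n):
            yield record.id, start, flank


# Example (input.fasta and GATTACA are placeholders):
# for rec_id, pos, flank in upstream_in_fasta("input.fasta", "GATTACA", 10):
#     print(rec_id, pos, flank)
```

Flanks near the start of a record come back shorter than n rather than being skipped; add a length check if you only want full-length flanks.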

Why not just use regex capture groups? I believe this would be much more efficient.

Updated 2.1 years ago by Ram 43k • written 9.2 years ago by Matt Shirley 10k
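The capture-group idea could look roughly like this (a sketch; the function name is hypothetical). The group (.{n}) captures exactly n characters right before the marker, and wrapping the whole pattern in a lookahead keeps overlapping hits. One consequence of this design: an occurrence with fewer than n bases before it simply does not match, so a marker near the start of a record produces no result.

```python
import re


def flanks_via_capture_group(seq, marker, n):
    """Return the n bases preceding each occurrence of marker, using a capture group."""
    # (.{n}) captures exactly n characters right before the marker. Putting the
    # whole pattern inside a lookahead (?=...) makes each match zero-width, so
    # overlapping occurrences are not skipped.
    pattern = re.compile("(?=(.{%d})%s)" % (n, re.escape(marker)))
    return [m.group(1) for m in pattern.finditer(seq)]
```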

The OP wants to use Biopython. Come to think of it, the way the question is phrased makes me think this could be an assignment question.

2.1 years ago by Ram 43k

Agreed.

Updated 2.1 years ago by Ram 43k • written 9.2 years ago by Matt Shirley 10k

Biopython is the only package I am familiar with; that's why I emphasized using it.

But as it turns out, I'll have to learn regex too. Thank you for your help! :)

Updated 2.1 years ago by Ram 43k • written 9.2 years ago by priyankakukreja267 • 0

Ram · Answer 2 · 2015-02-13

As mentioned by RamRS, you could use Bio.SeqIO to parse the FASTA file. Alternatively, if you prefer working with plain strings, SimpleFastaParser (from Bio.SeqIO.FastaIO import SimpleFastaParser) might be useful; it yields (title, sequence) tuples of strings.

If you are looking for an exact substring match, both the Python string object and the Biopython Seq (sequence) object offer a .find(...) method, which returns the start index of the first match, or -1 if there is none.
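A sketch combining SimpleFastaParser with str.find, assuming an exact marker and, for the file-reading part, Biopython installed (both function names are hypothetical). Calling find again with a start position one past the previous hit walks through every occurrence, including overlapping ones. Because Seq also has .find, the same loop should work on a Seq object, although slicing a Seq returns a Seq rather than a str.

```python
def flanks_via_find(seq, marker, n):
    """Return (position, flank) pairs for every exact occurrence of marker."""
    hits = []
    start = seq.find(marker)  # -1 means "not found"
    while start != -1:
        # Clip at 0 so a marker near the start gives a shorter flank.
        hits.append((start, seq[max(0, start - n):start]))
        # Search again one position further on, so overlapping hits are found.
        start = seq.find(marker, start + 1)
    return hits


def flanks_in_fasta_strings(path, marker, n):
    """Yield (title, position, flank) using Biopython's SimpleFastaParser."""
    # Requires Biopython; imported here so flanks_via_find works without it.
    from Bio.SeqIO.FastaIO import SimpleFastaParser

    with open(path) as handle:
        # SimpleFastaParser yields (title, sequence) tuples of plain strings.
        for title, seq in SimpleFastaParser(handle):
            for start, flank in flanks_via_find(seq, marker, n):
                yield title, start, flank
```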

If you are looking for a more complicated pattern, then, as RamRS suggested, Python's regular expression module (re) might be a good choice.
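As a hypothetical illustration of a "more complicated pattern", a marker with one variable base (GA, then any of A/C/G/T, then TC) can be written as a regex character class, and the same slicing then extracts the bases before each hit. The function name and the marker are assumptions for the example. The matched text is returned too, since it can differ between hits. Note that re.finditer reports non-overlapping matches only.

```python
import re

# Hypothetical marker: "GA", then any one of A/C/G/T, then "TC".
DEGENERATE_MARKER = r"GA[ACGT]TC"


def flanks_for_pattern(seq, pattern, n):
    """Return (position, matched text, flank) for each regex match in seq."""
    hits = []
    # finditer reports non-overlapping matches, scanning left to right.
    for m in re.finditer(pattern, seq):
        start = m.start()
        # The matched text is kept because it can differ between hits.
        hits.append((start, m.group(0), seq[max(0, start - n):start]))
    return hits
```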