Question

Finding arbitrary sequences in Seq objects from Biopython

0

10.0 years ago

knpayne2 • 0

Let's say I have a large database of cDNA sequences in FASTA format, and I would like to identify a motif in the corresponding amino acid sequence. Let's say I need to find something like:

CxxCxxxxxxxxxxxxHxxx$

where $ will be H or C

I imagine one would start by parsing the FASTA files, find the sites where these sub-sequences have to be, and then translate the corresponding coding DNA sequence, ending up with an amino acid sequence that contains a sequence of this form. If I had a specific amino acid sequence in mind, I could easily find it with the .find() method in Biopython. However, I'm not sure how to search for a general form like the one above, which matches a whole set of sequences rather than a single one.

Thanks!

python biopython sequence • 4.5k views

ADD COMMENT • link updated 2.6 years ago by Ram 45k • written 10.0 years ago by knpayne2 • 0

0

The question needs clarification in quite a few places. To start with, what type of sequences does your database contain? "FASTA" is only the format; it tells us nothing about the type of the underlying sequence.

ADD REPLY • link 10.0 years ago by Ram 45k

0

Sorry, these are cDNA sequences parsed from a set of FASTA files. I then translate the sections between the KpnI and BamHI sites. In the resulting amino acid sequence, I need to find a sub-sequence that matches the pattern:

CxxCxxxxxxxxxxxxHxxx$

where $ will be H or C

I hope that is more clear.
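For illustration, the extraction step described here could be sketched roughly as below. It assumes the usual recognition sequences (GGTACC for KpnI, GGATCC for BamHI) and that the region of interest lies strictly between the first KpnI site and the next BamHI site. extract_insert is a hypothetical helper name, not a Biopython function, and the reading frame is not checked.

```python
KPNI_SITE = "GGTACC"   # KpnI recognition sequence
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence


def extract_insert(cdna):
    """Return the DNA between the first KpnI site and the next BamHI site.

    Returns None if either site is missing. Assumption: the sites
    themselves are not part of the region to translate.
    """
    cdna = str(cdna).upper()
    start = cdna.find(KPNI_SITE)
    if start == -1:
        return None
    start += len(KPNI_SITE)              # skip past the KpnI site itself
    end = cdna.find(BAMHI_SITE, start)   # first BamHI site after it
    if end == -1:
        return None
    return cdna[start:end]
```

The returned string can then be wrapped in a Biopython Seq and translated before searching for the pattern.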

ADD REPLY • link updated 2.6 years ago by Ram 45k • written 10.0 years ago by knpayne2 • 0

score 1 · Answer 1 · 2015-06-24

1

10.0 years ago

Asaf 10k

You can start by reading the sequences with fain = SeqIO.parse('filename.fa', 'fasta'), then iterate over the records (for seqrc in fain:), translate each one with seqrc.seq.translate(), and use the re (regular expression) module to find your pattern.
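A minimal sketch of that approach, under a few assumptions: each record is translated in frame 1 with the standard table, 'x' in the question's notation means any residue except a stop, and the names MOTIF, motif_to_regex, find_motif, and scan_fasta are made up for this example.

```python
import re

# The motif from the question: 'x' is any residue, '$' is H or C.
MOTIF = "CxxCxxxxxxxxxxxxHxxx$"


def motif_to_regex(motif):
    """Turn the question's notation into a compiled regular expression."""
    parts = []
    for symbol in motif:
        if symbol == "x":
            parts.append("[A-Z]")   # any residue, but not a '*' stop codon
        elif symbol == "$":
            parts.append("[HC]")    # H or C
        else:
            parts.append(re.escape(symbol))
    return re.compile("".join(parts))


def find_motif(protein, regex):
    """Return (start, matched_text) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in regex.finditer(str(protein))]


def scan_fasta(path, regex):
    """Translate every record in a FASTA file and yield (record id, hits)."""
    # Imported here so the helpers above can be used without Biopython.
    from Bio import SeqIO

    for record in SeqIO.parse(path, "fasta"):
        protein = record.seq.translate()  # frame 1, standard genetic code
        hits = find_motif(protein, regex)
        if hits:
            yield record.id, hits
```

Usage would look something like: for seq_id, hits in scan_fasta('filename.fa', motif_to_regex(MOTIF)): print(seq_id, hits). If the coding region does not start at the first base of a record, slice the sequence before translating.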

ADD COMMENT • link 10.0 years ago by Asaf 10k

0

Biopython now has a Bio.Motifs package, and the MEME suite can help you scan for a list of motifs.

ADD REPLY • link 10.0 years ago by cyril-cros ▴ 950

0

Note that Bio.Motif was deprecated; you'd want Bio.motifs (lower case, with an s).

ADD REPLY • link 10.0 years ago by Peter 6.0k

0

Yes, I'd also consider using a regular expression (via import re) here.
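One thing to watch with a plain regular expression is that re.finditer skips hits that overlap an earlier match. A possible workaround, sketched here with a zero-width lookahead, reports every start position. The names below are illustrative only, and 'x' is treated as any character.

```python
import re

# The motif from the question: 'x' is any residue, '$' is H or C.
MOTIF = "CxxCxxxxxxxxxxxxHxxx$"

# Translate the notation, then wrap it in a lookahead so that the regex
# engine consumes no characters and can report overlapping hits.
_body = "".join({"x": ".", "$": "[HC]"}.get(symbol, symbol) for symbol in MOTIF)
OVERLAPPING = re.compile("(?=(" + _body + "))")


def find_overlapping(protein):
    """Return (start, matched_text) for every hit, including overlapping ones."""
    return [(m.start(), m.group(1)) for m in OVERLAPPING.finditer(str(protein))]
```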

ADD REPLY • link 10.0 years ago by Peter 6.0k