Question: Finding arbitrary sequences in Seq objects from Biopython
3.9 years ago by
knpayne20
Canada
knpayne20 wrote:

Let's say I have a large database of cDNA sequences in FASTA format, and I would like to identify a motif in the corresponding amino acid sequences. Let's say I need to find something like:

CxxCxxxxxxxxxxxxHxxx$  where $ will be H or C

I imagine one would start by parsing the FASTA files, find the sites where these sub-sequences have to be, and then translate the corresponding coding DNA sequence, ending up with an amino acid sequence that contains a sequence of this form. If I had a specific amino acid sequence in mind, I could easily find it with the .find() method of a Biopython Seq object. However, I'm not sure how to search for a pattern like the one above, which stands for a whole set of sequences rather than a single one.

 

Thanks!

biopython sequence python • 2.0k views
modified 3.9 years ago by Asaf5.6k • written 3.9 years ago by knpayne20

The question needs clarification in several places. To start with, what type of sequences does your database contain? "FASTA" is only the file format; it tells us nothing about the type of the underlying sequences.

written 3.9 years ago by RamRS21k

Sorry, these are cDNA sequences parsed from a set of FASTA files. I then translate the sections between the KpnI and BamHI sites. Within the resulting amino acid sequence, I need to find a sub-sequence that matches the pattern:

CxxCxxxxxxxxxxxxHxxx$  where $ will be H or C

I hope that is more clear.
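
For reference, the extraction step described above could be sketched roughly as follows. This is only an illustration under several assumptions that the thread does not confirm: the sites are searched on the forward strand only, the recognition sequences themselves are excluded, and the reading frame starts directly after the KpnI site. The function names insert_between_sites and translate_insert are hypothetical.

```python
KPNI = "GGTACC"   # KpnI recognition sequence
BAMHI = "GGATCC"  # BamHI recognition sequence


def insert_between_sites(dna, left=KPNI, right=BAMHI):
    """Return the DNA between the first `left` site and the next `right` site.

    Returns None if either site is missing. Forward strand only (assumption).
    """
    dna = dna.upper()
    start = dna.find(left)
    if start == -1:
        return None
    start += len(left)            # skip past the KpnI site itself
    end = dna.find(right, start)  # first BamHI site after it
    if end == -1:
        return None
    return dna[start:end]


def translate_insert(dna):
    """Translate the insert, assuming frame 1 starts right after the KpnI site."""
    from Bio.Seq import Seq  # requires Biopython

    insert = insert_between_sites(dna)
    if insert is None:
        return None
    # Drop a trailing partial codon so translate() sees whole codons only.
    insert = insert[: len(insert) - len(insert) % 3]
    return str(Seq(insert).translate())
```

If the real constructs keep the frame across the restriction sites, the slice boundaries would need adjusting accordingly.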

ADD REPLYlink written 3.9 years ago by knpayne20
3.9 years ago by
Asaf5.6k
Israel
Asaf5.6k wrote:

You can start by reading the sequences with fain = SeqIO.parse('filename.fa', 'fasta'), then iterate over them with for seqrc in fain:. For each sequence, translate() it and use re (regular expressions) to find your pattern.
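
Putting those steps together, a minimal sketch might look like the following. The names find_motif and scan_fasta are made up for illustration, the replace-based conversion of the shorthand is just one way to write the pattern, and the code assumes every record is coding DNA that is already in the correct reading frame.

```python
import re

# Shorthand from the question: x = any residue, $ = H or C.
MOTIF = "CxxCxxxxxxxxxxxxHxxx$"
# Turn the shorthand into a regular expression: x -> ".", $ -> "[HC]".
PATTERN = re.compile(MOTIF.replace("x", ".").replace("$", "[HC]"))


def find_motif(protein):
    """Return (start, matched_text) for each non-overlapping hit in a protein string."""
    return [(m.start(), m.group()) for m in PATTERN.finditer(protein)]


def scan_fasta(path):
    """Yield (record_id, hits) for every record whose translation contains the motif."""
    from Bio import SeqIO  # requires Biopython

    for record in SeqIO.parse(path, "fasta"):
        # Assumption: the record is coding DNA in frame 1.
        protein = str(record.seq.translate())
        hits = find_motif(protein)
        if hits:
            yield record.id, hits
```

Something like for name, hits in scan_fasta('filename.fa'): print(name, hits) would then list the matching records. Note that . also matches the * that translate() emits for stop codons; replacing it with [A-Z] avoids hits that span a stop.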

written 3.9 years ago by Asaf5.6k

Biopython now has a Bio.Motifs package, and the MEME suite can help you scan for a list of motifs.

written 3.9 years ago by cyril-cros890

Note that Bio.Motif was deprecated; you'd want Bio.motifs (lower case, with an s).

written 3.9 years ago by Peter5.8k

Yes, I'd also consider using a regular expression (via import re) here.
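
As a rough illustration of the regular-expression route, the sketch below converts the x/$ shorthand into a compact regex and reports every start position, including overlapping hits, by wrapping the pattern in a lookahead. The helper names shorthand_to_regex and find_overlapping are hypothetical, and treating $ as the class "H or C" is taken from the question.

```python
import re
from itertools import groupby


def shorthand_to_regex(shorthand, wildcard="x", classes=None):
    """Convert shorthand such as 'CxxC$' into a regex such as 'C.{2}C[HC]'.

    `classes` maps placeholder characters to the residues they stand for,
    e.g. {'$': 'HC'}.
    """
    classes = classes or {}
    parts = []
    for char, run in groupby(shorthand):
        n = len(list(run))
        if char == wildcard:
            # A run of n wildcards becomes "." or ".{n}".
            parts.append("." if n == 1 else ".{%d}" % n)
        elif char in classes:
            parts.append(("[%s]" % classes[char]) * n)
        else:
            # Literal residue (escaped in case it is a regex metacharacter).
            parts.append(re.escape(char) * n)
    return "".join(parts)


def find_overlapping(regex, protein):
    """Return the start position of every match, including overlapping ones."""
    lookahead = re.compile("(?=(%s))" % regex)
    return [m.start() for m in lookahead.finditer(protein)]
```

For the motif in the question, the call would be shorthand_to_regex("CxxCxxxxxxxxxxxxHxxx$", classes={"$": "HC"}), and the result can be passed straight to find_overlapping together with a translated sequence as a plain string.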

written 3.9 years ago by Peter5.8k