Question: How do I deal with spaces in sequence names with Biopython?
grayapply2009 wrote (4.0 years ago):

I have a FASTA file formatted as follows:

    >UPF0471 protein C1orf63 homolog
    some sequence
    >WD repeat-containing protein 43
    some sequence
    >transmembrane protein 41A
    some sequence

When I print record.id or build dictionaries, Biopython does not keep the spaces in the sequence names: it takes only the first word of each header. How can I make Biopython use the whole name rather than just the first word?


Replace the spaces with "_" or "-"?

(reply by pld, 4.0 years ago)
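If you go the renaming route, here is a minimal plain-Python sketch (no Biopython needed) that rewrites the header lines before parsing. The function name `underscore_headers` and the placeholder sequence `ACGT` are made up for illustration; the function collapses each run of whitespace in a header into a single `_` and passes sequence lines through unchanged.

```python
def underscore_headers(fasta_lines):
    """Yield FASTA lines with whitespace in header lines replaced by underscores.

    Hypothetical helper for illustration; sequence lines are passed through unchanged.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            # split() drops the trailing newline and any run of spaces/tabs,
            # so each gap between words becomes exactly one underscore
            yield ">" + "_".join(line[1:].split()) + "\n"
        else:
            yield line


# The asker's header style, with a placeholder sequence
example = [
    ">UPF0471 protein C1orf63 homolog\n",
    "ACGT\n",
    ">WD repeat-containing protein 43\n",
    "ACGT\n",
]
print("".join(underscore_headers(example)), end="")
```

With real files you could stream through it, e.g. `dst.writelines(underscore_headers(src))` where `src` and `dst` are opened from your own input and output paths; after that, `record.id` holds the whole (underscored) name.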

You'll find most tools treat spaces in FASTA identifiers the same way (the identifier ends at the first space), so that's a good idea!

(reply by Peter, 4.0 years ago)
Damian Kao wrote (4.0 years ago):

You can get the whole header line by using record.description (record.id only holds the text up to the first space).

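To see the difference, here is a minimal sketch, assuming Biopython is installed. It parses the asker's headers from an in-memory string rather than a file; the sequence `ACGT` is only a placeholder for "some sequence".

```python
from io import StringIO

from Bio import SeqIO

# The asker's headers, with a placeholder sequence for illustration only
fasta_text = """>UPF0471 protein C1orf63 homolog
ACGT
>WD repeat-containing protein 43
ACGT
>transmembrane protein 41A
ACGT
"""

for record in SeqIO.parse(StringIO(fasta_text), "fasta"):
    # record.id stops at the first whitespace; record.description keeps the full header line
    print(repr(record.id), "|", repr(record.description))
```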
Peter wrote (4.0 years ago):

To answer your second question, how to make a dictionary with SeqIO.to_dict that uses the full descriptions (spaces included) as keys: you need the key_function argument, as help(to_dict) tries to explain. For example, where sequences is the iterator returned by SeqIO.parse:

    from Bio.SeqIO import to_dict
    my_dict = to_dict(sequences, key_function=lambda rec: rec.description)

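A fuller, self-contained version of the same idea, assuming Biopython is installed. The records are read from an in-memory string, and `ACGT` is a placeholder sequence.

```python
from io import StringIO

from Bio import SeqIO

# Placeholder records using the asker's header style
fasta_text = """>WD repeat-containing protein 43
ACGT
>transmembrane protein 41A
ACGT
"""

records = SeqIO.parse(StringIO(fasta_text), "fasta")
# Key each record by its full header line instead of the default record.id
by_description = SeqIO.to_dict(records, key_function=lambda rec: rec.description)

print(sorted(by_description))
print(by_description["transmembrane protein 41A"].seq)
```

Note that to_dict raises a ValueError on duplicate keys, so this only works if every full header line in the file is unique.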
grayapply2009 wrote (4.0 years ago):

Then how do I make dictionaries with SeqIO.to_dict?


This isn't an answer; isn't it a new question, or an addendum to your original question?

(reply by Peter, 4.0 years ago)